HW3-drafting-viz

Author

Kristin Art (she/her)

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, fold = TRUE)
library(tidyverse)
library(here)
library(sf)
library(patchwork)
library(ggspatial)
library(cowplot)
library(plotly)
library(ggpubr)
library(gridExtra)
library(treemapify)
library(treemap)
library(RColorBrewer)
library(scales)
library(paletteer)
library(ggtext)
library(ggpattern)
library(showtext)


font_add_google(name = "Montserrat", family = "mont")
showtext_auto()

Description

I plan to pursue option 2 for HW #4. My research question has changed to the following:

How does potential supply and demand of green hydrogen fuel in 2030 vary across space in California?

Within this overarching question, I will answer the following questions using my infographic components:

  1. What is the spatial distribution of potential green hydrogen supply and demand for hydrogen fuel from the transportation sector in 2030?

  2. How does potential supply and demand for hydrogen fuel compare within and between regions in 2030? Will individual regions be able to produce enough supply to meet local demand amounts?

  3. Will individual counties in Southern California be able to produce enough supply to meet local demand amounts in 2030? Which counties will be net importers or net exporters of hydrogen fuel?

I have decided to include data from my MESM Group Project (GP) for this assignment. I have used output data from our GP that contains the following variables:

  • demand_h2_kg_d, which is the amount of hydrogen in kg per day that will be demanded at each projected demand location. These values correspond to point geometries for potential hydrogen re-fueling stations that were developed through a comprehensive study by the UC Davis Institute of Transportation Studies (Fulton et al. 2023).

  • supply_h2_kg_d, which is the amount of hydrogen in kg per day that can be produced through wind- or solar-powered electrolysis. These values correspond to point geometries that my GP team developed by identifying renewable resource potential, excluding areas unsuitable for renewable development, and siting electrolyzer (hydrogen production technology) facilities through an optimization model.

I aggregated these two variables into county and regional totals for plotting purposes by spatially joining the data with county geometries from the US Census Bureau.
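The aggregation step can be sketched on toy data. This is a minimal illustration, not the GP workflow: the two square "counties", the three supply sites, and all values below are made up, and the coordinates are planar with no CRS assigned.

```r
library(sf)
library(dplyr)

# Helper (hypothetical, for illustration only): build a square polygon
square <- function(xmin, ymin, size = 1) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Two made-up counties sitting side by side
toy_counties <- st_sf(
  name = c("County A", "County B"),
  geometry = st_sfc(square(0, 0), square(1, 0))
)

# Three made-up supply sites with invented daily production values
toy_sites <- st_sf(
  supply_h2_kg_d = c(100, 250, 400),
  geometry = st_sfc(
    st_point(c(0.2, 0.5)), st_point(c(0.7, 0.3)), st_point(c(1.5, 0.5))
  )
)

# Attach the county each site falls in, drop geometries, then sum per county
toy_totals <- toy_sites %>%
  st_join(toy_counties, join = st_intersects) %>%
  st_drop_geometry() %>%
  group_by(name) %>%
  summarize(supply_h2_kg_d = sum(supply_h2_kg_d))

print(toy_totals)
```

Two sites fall in County A and one in County B, so the county totals are 350 and 400 kg per day. Regional totals follow the same pattern, grouping by a region column instead of the county name.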

Visualization Resources

Below are two visualizations I might borrow from for this assignment.

The first is a visualization created by Cedric Scherer. The code is at https://github.com/z3tt/TidyTuesday/blob/main/R/2021_27_AnimalRescues.Rmd and the rendered plots are at https://github.com/Z3tt/TidyTuesday/tree/main/plots/2021_27. I like the layout of this visualization and the use of annotations alongside the visuals. I also like that he colored certain words and used color on the plots to emphasize them. I may adapt his code to figure out how to size the graphics, add annotations, etc.

The second is a visualization created by Dan Oehm that I found here: https://github.com/doehm/tidytues/blob/main/scripts/2023/week-31-us-states/us-states.R. I like how he annotated with text and may borrow from his code to understand how to add and position annotations. He also used patchwork, included custom fonts, and labeled histogram bars, which I would like to do for one of my figures as well.

Draft Infographic

Here is a rough drawing of the infographic I would like to make. The actual layout of the subplots and text will probably change based on what the components look like after being coded.

Below are drafts of the infographic’s subplots:

# load CA polygon
ca_sf <- spData::us_states %>%
  filter(GEOID == "06") %>%
  st_transform(crs = "EPSG:4326")

# load ca county outlines
counties_sf <- read_sf(here::here("data/census/tl_2022_us_county/tl_2022_us_county.shp")) %>%
  filter(STATEFP == "06") %>% # FIPS code for CA is 06
  janitor::clean_names() %>%
  dplyr::select(countyfp, geoid, name, namelsad, geometry) %>%
  st_transform(crs = "EPSG:4326")

# load transportation demand
demand_mod <- st_read(here::here("data/fueling_demand_2030/fueling_demand_2030.shp"))
Reading layer `fueling_demand_2030' from data source 
  `/Users/kristinart/Library/CloudStorage/GoogleDrive-kristinart@ucsb.edu/My Drive/Winter 2024/EDS 240/Assignments/art-eds240-HW4/data/fueling_demand_2030/fueling_demand_2030.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 658 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -124.2788 ymin: 32.55778 xmax: -114.1312 ymax: 42.00949
Geodetic CRS:  WGS 84
# load combined demand electrolyzer outputs from script 05
elect_demand <- st_read(here::here("data/electrolyzers_output/electrolyzer_demand_limited_baseline_uncapped.geojson")) ## THIS INCLUDES GEOMS OF THE ELECTROLYZER POINTS
Reading layer `electrolyzer_demand_limited_baseline_uncapped' from data source 
  `/Users/kristinart/Library/CloudStorage/GoogleDrive-kristinart@ucsb.edu/My Drive/Winter 2024/EDS 240/Assignments/art-eds240-HW4/data/electrolyzers_output/electrolyzer_demand_limited_baseline_uncapped.geojson' 
  using driver `GeoJSON'
Simple feature collection with 116 features and 17 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -124.0569 ymin: 32.70345 xmax: -114.6533 ymax: 41.88911
Geodetic CRS:  WGS 84
# load combined renewables electrolyzer outputs from script 05
elect_renewables <- st_read(here::here("data/electrolyzers_output/electrolyzer_renewables_limited_baseline_uncapped.geojson")) ## THIS INCLUDES GEOMS OF THE ELECTROLYZER POINTS
Reading layer `electrolyzer_renewables_limited_baseline_uncapped' from data source `/Users/kristinart/Library/CloudStorage/GoogleDrive-kristinart@ucsb.edu/My Drive/Winter 2024/EDS 240/Assignments/art-eds240-HW4/data/electrolyzers_output/electrolyzer_renewables_limited_baseline_uncapped.geojson' 
  using driver `GeoJSON'
Simple feature collection with 90 features and 17 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -123.9 ymin: 32.7 xmax: -114.9 ymax: 41.6
Geodetic CRS:  WGS 84
# load combined renewables electrolyzer outputs from script 05
elect_renewables2 <- st_read(here::here("data/electrolyzers_output/electrolyzer_renewables_renewablegeoms_limited_baseline_uncapped.geojson")) ## THIS INCLUDES GEOMS OF THE RENEWABLE POINTS
Reading layer `electrolyzer_renewables_renewablegeoms_limited_baseline_uncapped' from data source `/Users/kristinart/Library/CloudStorage/GoogleDrive-kristinart@ucsb.edu/My Drive/Winter 2024/EDS 240/Assignments/art-eds240-HW4/data/electrolyzers_output/electrolyzer_renewables_renewablegeoms_limited_baseline_uncapped.geojson' 
  using driver `GeoJSON'
Simple feature collection with 60 features and 17 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -123.906 ymin: 32.65 xmax: -114.669 ymax: 41.61
Geodetic CRS:  WGS 84
# spatially join county geoms to demand and supply data ----
demand_counties <- demand_mod %>% st_join(counties_sf, join = st_intersects)
supply_counties <- elect_renewables2 %>% st_join(counties_sf) # for supply geoms based on renewables
# supply_counties <- elect_renewables %>% st_join(counties_sf) #for supply geoms based on electrolyzers

# sum county-wide hydrogen in each df ----
h2_demand_counties <- demand_counties %>%
  st_drop_geometry() %>% # drop geometries
  group_by(name) %>% # name = county name
  summarize(demand_h2_kg_d = sum(fueling)) # sum the amount of demand (kg/ d) within each county

h2_supply_counties <- supply_counties %>%
  st_drop_geometry() %>% # drop geometries
  group_by(name) %>% # name = county name
  summarize(supply_h2_kg_d = sum(potential_h2_kg_d)) # sum the amount of supply (kg/ d) within each county

# combine demand and supply data ----
h2_counties <- h2_demand_counties %>%
  full_join(h2_supply_counties, by = "name") %>% # join by county name
  mutate(across(everything(), ~ ifelse(is.na(.), 0, .))) %>% # replace NAs
  rename("county" = "name") %>% # rename name column to county
  mutate(difference = abs(supply_h2_kg_d - demand_h2_kg_d)) %>% # calculate difference between supply and demand amounts
  arrange(desc(difference)) # arrange in order based on difference

# define counties within each state region. based on https://ww2.arb.ca.gov/lcti-central-region  ----

coastal_ca_counties <- c("Alameda", "Contra Costa", "Marin", "Monterey", "Napa", "San Benito", "San Francisco", "San Mateo", "San Luis Obispo", "Santa Clara", "Santa Cruz", "Sonoma", "Solano")

northern_ca_counties <- c("Alpine", "Amador", "Butte", "Calaveras", "Colusa", "Del Norte", "El Dorado", "Glenn", "Humboldt", "Lake", "Lassen", "Mendocino", "Modoc", "Nevada", "Placer", "Plumas", "Sacramento", "Shasta", "Sierra", "Siskiyou", "Sutter", "Tehama", "Trinity", "Yolo", "Yuba")

central_ca_counties <- c("Fresno", "Inyo", "Kern", "Kings", "Madera", "Mariposa", "Merced", "Mono", "San Joaquin", "Stanislaus", "Tulare", "Tuolumne")

southern_ca_counties <- c("Imperial", "Los Angeles", "Orange", "Riverside", "San Bernardino", "San Diego", "Santa Barbara", "Ventura")

# add regions and difference groupings to df ----
h2_counties <- h2_counties %>%
  mutate(
    region = case_when(
      county %in% northern_ca_counties ~ "Northern CA",
      county %in% southern_ca_counties ~ "Southern CA",
      county %in% central_ca_counties ~ "Central CA",
      county %in% coastal_ca_counties ~ "Coastal CA"
    ),
    difference_description = case_when(
      (supply_h2_kg_d - demand_h2_kg_d) > 0 ~ "more supply",
      (supply_h2_kg_d - demand_h2_kg_d) < 0 ~ "more demand"
    )
  )

# add county geometries back to full df ---
h2_counties_sf <- left_join(counties_sf, h2_counties, by = c("name" = "county")) %>%
  mutate(region = ifelse(name == "San Francisco", "Northern CA", region))

# define counties with highest demand and production ----
top_5_demand <- h2_counties %>%
  slice_max(order_by = demand_h2_kg_d, n = 5)

top_5_supply <- h2_counties %>%
  slice_max(order_by = supply_h2_kg_d, n = 5)

# count how many counties have more supply than demand and vice versa ----
county_sum <- h2_counties %>%
  group_by(difference_description) %>%
  summarise(count = n())
# define color palettes, from coolors
# n = 4
pal_regions <- c(
  "#64865B", "#664C5E", "#4F7D7A", "#D19180"
)

# n = 8
pal_regions2 <- c(
  "#5B656C", "#664C5E",
  "#719B82", "#4F7D7A",
  "#86A77D", "#64865B",
  "#9C6F6F", "#D19180"
)

# n = 8, ordered for half circle plots. should probably write a lookup for region/type if the half circle area plots are used
pal_regions3 <- c(
  "#5B656C", "#719B82",
  "#86A77D", "#9C6F6F",
  "#664C5E", "#4F7D7A",
  "#64865B", "#D19180"
)

Demand Map

# plot demand sites and amounts ----
p1 <-
  ggplot() +
  geom_sf(data = ca_sf) +
  geom_sf(data = h2_counties_sf, aes(fill = region, shape = "Demand Center"), alpha = 0.5, color = "transparent") +
  scale_fill_manual(
    values = pal_regions,
    breaks = c("Northern CA", "Central CA", "Coastal CA", "Southern CA"),
    guide = "none"
  ) +
  geom_sf(data = elect_demand, aes(size = demand_capacity_served), col = "grey40", alpha = 0.7) +
  scale_size(range = c(2, 5)) +
  geom_sf(data = elect_renewables2, aes(shape = "Production Site"), col = "black", size = 2.5, alpha = 0.7) +
  labs(
    size = "Demand (kg/ day)",
    shape = "Location Type",
    title = "Spatial Distribution of Hydrogen Supply and Demand (2030)",
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, hjust = 0.5, family = "mont"),
    plot.title.position = "plot",
    plot.subtitle = element_text(size = 14, hjust = 0.5, color = "grey50", family = "mont"),
    legend.position = c(0.8, 0.75),
    legend.background = element_rect(fill = "grey97", color = "grey80"),
    panel.grid = element_blank(),
    axis.title = element_blank(),
    axis.text = element_blank(),
    legend.title = element_text(size = 14, family = "mont", face = "bold"),
    legend.text = element_text(size = 11, family = "mont")
  ) +
  annotate(
    geom = "label",
    x = c(-123.8, -117.8, -122.1, -121.1),
    y = c(39.6, 36.4, 35.9, 33.3),
    label = c("Northern CA", "Central CA", "Coastal CA", "Southern CA"),
    color = pal_regions, # label colors match the region fills; the duplicate color = "black" argument was removed
    size = 4,
    family = "mont",
    fontface = "bold",
    label.size = 1
  ) +
  guides(size = guide_legend(order = 2), shape = guide_legend(order = 1, override.aes = list(fill = c("grey50", "black"))))

# ggsave("figures/demand_map.png",bg="white")
print(p1)

Regional Supply/ Demand Bar Chart (Vertical)

# pivot longer ----
h2_regions_long <- h2_counties %>%
  rename(Demand = demand_h2_kg_d, Production = supply_h2_kg_d) %>%
  pivot_longer(cols = Demand:Production, values_to = "count", names_to = "type") %>%
  mutate(
    x = case_when(
      type == "Demand" ~ 1,
      type == "Production" ~ 2
    ),
    region = as.factor(region)
  ) %>%
  group_by(region, type) %>%
  summarize(count = sum(count))

# dodged bar chart
p2 <-
  h2_regions_long %>%
  mutate(
    region = factor(region, levels = rev(c("Coastal CA", "Northern CA", "Central CA", "Southern CA"))), # ordered by highest demand
    # mutate(region = factor(region, levels = c("Southern CA","Coastal CA", "Northern CA","Central CA")), #ordered by highest production
    type = factor(type, levels = rev(c("Production", "Demand")))
  ) %>%
  ggplot(aes(x = region, y = count, pattern = type)) +
  geom_col_pattern(fill = pal_regions2, position = "dodge", alpha = 0.9, color = "grey0") +
  scale_pattern_manual(values = c("wave", "none")) +
  scale_pattern_fill_manual(values = c("black", "white")) +
  scale_y_continuous(expand = c(0.02, 0), labels = scales::label_number(scale_cut = cut_short_scale())) +
  expand_limits(y = max(7000000)) +
  geom_text(aes(label = (c(
    "2.5M", "5.9M",
    "1.3M", "1.9M",
    "1.9M", "4.9M",
    "6.1M", "1.8M"
  ))), color = "black", vjust = -0.9, hjust = 0.5, position = position_dodge(width = 0.9), fontface = "bold") +
  labs(y = "", x = "", title = "Daily Hydrogen Production and Demand by Region (2030)") + # Hydrogen (kg/day)
  theme_minimal() +
  theme(
    axis.title = element_text(size = 14, family = "mont"),
    axis.title.x = element_blank(),
    plot.title = element_text(size = 16, hjust = 0.5, family = "mont", face = "bold", margin = margin(b = 10)),
    plot.title.position = "plot",
    # plot.subtitle = element_text(size = 14, hjust = 0.5, color = "grey50"),
    legend.direction = "horizontal",
    legend.position = "top",
    legend.box.just = "center",
    # legend.justification = "center",
    legend.title = element_blank(),
    panel.grid = element_blank(),
    axis.text.x = element_text(size = 12, family = "mont"),
    axis.text.y = element_blank(),
    legend.text = element_text(size = 12, family = "mont")
  ) +
  guides(fill = guide_legend(reverse = TRUE))

# ggsave("figures/top5_county_supply_demand.png",bg="white", units = "in", width = 7)
print(p2)

Regional Supply/ Demand Bar Chart (Horizontal)

# pivot longer ----
h2_regions_long <- h2_counties %>%
  rename(Demand = demand_h2_kg_d, Production = supply_h2_kg_d) %>%
  pivot_longer(cols = Demand:Production, values_to = "count", names_to = "type") %>%
  mutate(
    x = case_when(
      type == "Demand" ~ 1,
      type == "Production" ~ 2
    ),
    region = as.factor(region)
  ) %>%
  group_by(region, type) %>%
  summarize(count = sum(count))

# dodged bar chart
p2_h <-
  h2_regions_long %>%
  mutate(
    region = factor(region, levels = c("Coastal CA", "Northern CA", "Central CA", "Southern CA")), # ordered by highest demand
    # mutate(region = factor(region, levels = c("Southern CA","Coastal CA", "Northern CA","Central CA")), #ordered by highest production
    type = factor(type, levels = c("Production", "Demand"))
  ) %>%
  ggplot(aes(x = region, y = count, pattern = type)) +
  geom_col_pattern(fill = pal_regions2, position = "dodge", alpha = 0.9, color = "grey0") +
  scale_pattern_manual(values = c("none", "wave")) +
  scale_pattern_fill_manual(values = c("black", "white")) +
  labs(y = "", x = "") + # Hydrogen (kg/day)
  scale_y_continuous(expand = c(0.02, 0), labels = scales::label_number(scale_cut = cut_short_scale())) +
  expand_limits(y = max(7000000)) +
  geom_text(aes(label = (c(
    "2.5M", "5.9M",
    "1.3M", "1.9M",
    "1.9M", "4.9M",
    "6.1M", "1.8M"
  ))), color = "black", vjust = 0.5, hjust = -0.3, position = position_dodge(width = 0.9), family = "mont", fontface = "bold") +
  theme_minimal() +
  theme(
    axis.title = element_text(size = 14, family = "mont"),
    axis.title.x = element_blank(),
    # plot.title = element_text(size = 14, hjust = 0.5, family = "mont"),
    # plot.title.position = "plot",
    # plot.subtitle = element_text(size = 14, hjust = 0.5, color = "grey50"),
    legend.direction = "horizontal",
    legend.position = "top",
    legend.box.just = "center",
    # legend.justification = "center",
    legend.title = element_blank(),
    panel.grid = element_blank(),
    axis.text.y = element_text(size = 12, family = "mont"),
    axis.text.x = element_blank(),
    legend.text = element_text(size = 12, family = "mont")
  ) +
  guides(fill = guide_legend(reverse = TRUE)) +
  coord_flip()

# ggsave("figures/top5_county_supply_demand.png",bg="white", units = "in", width = 7)
print(p2_h)

Regional Supply/ Demand Half Circle Plot

# need to rearrange this order
p2b <- h2_regions_long %>%
  mutate(
    x = case_when(
      type == "Demand" ~ 1,
      type == "Production" ~ 2
    ),
    region = as.factor(region)
  ) %>%
  ggplot(aes(x = x, y = sqrt(count), fill = interaction(region, type), pattern = type)) +
  geom_col_pattern(width = 1, show.legend = TRUE) +
  scale_fill_manual(values = pal_regions3, guide = "none") +
  scale_pattern_manual(name = "", values = c("Demand" = "wave", "Production" = "none"), guide = guide_legend(override.aes = list(shape = 21, size = 5, fill = "grey80"))) +
  scale_x_continuous(expand = c(0, 0), limits = c(0.5, 2.5)) +
  scale_y_continuous(expand = c(0, 0)) +
  coord_polar(theta = "x", direction = -1) +
  labs(fill = "") +
  theme_void() +
  theme(
    legend.text = element_text(family = "mont"),
    strip.text = element_text(family = "mont", size = 12),
    legend.position = "top",
    legend.direction = "horizontal",
    legend.justification = "center",
    plot.title = element_text(vjust = 3, hjust = 0.5, family = "mont")
  ) +
  facet_wrap(~region, strip.position = "bottom")
# facet_manual(~region, design = design, strip.position = "bottom")

print(p2b)

# Arrange half circle area plots in diamond shape, not working... order of calling faceted plots is off
# create_strip_plot <- function(p, row) {
#   region_name <- unique(p$data$region)[row]
#
#   strip_plot <- ggplot() +
#     ggtitle(region_name) +
#     theme_void() +
#     theme(
#       plot.margin = margin(0, 0, 0, 0)
#     )
#
#   plot_grid(
#     strip_plot,
#     get_panel(p, row),
#     ncol = 1,
#     rel_heights = c(0.1, 1)
#   )
# }

create_strip_plot <- function(p, row) {
  region_name <- unique(p$data$region)[row]

  strip_plot <- ggplot() +
    ggtitle(region_name) +
    theme_void() +
    theme(
      plot.margin = margin(0, 0, 0, 0)
    )

  legend <- NULL # Default to no legend

  # Include the legend only for the fourth panel (the bottom of the diamond layout)
  if (row == 4) {
    legend <- get_legend(p)
  }

  plot_grid(
    strip_plot,
    get_panel(p, row),
    legend,
    ncol = 1,
    rel_heights = c(0.1, 1, 0.1)
  )
}

blank_plot <- ggplot() +
  theme_void()

# arrange subplots in diamond shape
p2c <- plot_grid(
  blank_plot, create_strip_plot(p2b, 3), blank_plot,
  create_strip_plot(p2b, 1), blank_plot, create_strip_plot(p2b, 2),
  blank_plot, create_strip_plot(p2b, 4), blank_plot,
  ncol = 3
)

print(p2c)

# # Combine p2c and the legend
# legend <- get_legend(p2b)
# combined_plot <- plot_grid(
#   p2c + theme(legend.position = "none"),  # Hide the original legend in p2c
#   legend,
#   ncol = 1,
#   rel_heights = c(1, 0.1)  # Adjust the height for the legend
# )
#
# combined_plot

# # Create a blank plot for the spaces
# blank_plot <- ggplot() + theme_void()
#
# # Arrange the plots with blanks
# plot_grid(blank_plot, get_panel(p2b, 1),blank_plot,
#   get_panel(p2b, 2), blank_plot, get_panel(p2b, 3),
#   blank_plot, get_panel(p2b, 4), blank_plot,
#   ncol = 3
# )
#
#   guides(
#     pattern = guide_legend(
#       override.aes = list(
#         shape = 1,
#         size = 5,
#         fill = c("grey80", "grey80")
#       )
#     )
#   )
# whyyyy
# Arrange half circle area plots vertically.
p2d <- h2_regions_long %>%
  mutate(
    x = case_when(
      type == "Demand" ~ 1,
      type == "Production" ~ 2
    ),
    region = factor(region, levels = c("Northern CA", "Central CA", "Coastal CA", "Southern CA"))
  ) %>%
  ggplot(aes(x = x, y = sqrt(count), fill = interaction(region, type), pattern = type)) +
  geom_col_pattern(width = 1, show.legend = TRUE) +
  scale_fill_manual(values = pal_regions3, guide = "none") +
  scale_pattern_manual(name = "", values = c("Demand" = "wave", "Production" = "none"), guide = guide_legend(override.aes = list(shape = 21, size = 1, fill = "grey80"))) +
  # scale_pattern_manual(values = c("wave", "none"),
  #                      labels = c("Demand", "Production"))  +
  scale_x_continuous(expand = c(0, 0), limits = c(0.5, 2.5)) +
  scale_y_continuous(expand = c(0, 0)) +
  coord_polar(theta = "x", direction = -1) +
  labs(fill = "") +
  theme_void() +
  theme(
    legend.text = element_text(family = "mont"),
    strip.text = element_text(family = "mont", size = 12),
    legend.position = "bottom",
    legend.direction = "horizontal",
    legend.justification = "center",
    plot.title = element_text(vjust = 3, hjust = 0.5, family = "mont")
  ) +
  facet_wrap(~region, strip.position = "left", ncol = 1, nrow = 4)

print(p2d)

Southern CA Counties Supply/ Demand Bar Chart (Horizontal)

# would need to be colored if used
# pivot longer ----
socal_counties_long <- h2_counties %>%
  rename(Demand = demand_h2_kg_d, Production = supply_h2_kg_d) %>%
  pivot_longer(cols = Demand:Production, values_to = "count", names_to = "type") %>%
  mutate(
    x = case_when(
      type == "Demand" ~ 1,
      type == "Production" ~ 2
    ),
    county = as.factor(county)
  ) %>%
  filter(county %in% southern_ca_counties)

# dodged bar chart for top counties
p3 <-
  socal_counties_long %>%
  mutate(
    county = factor(county, levels = c("Santa Barbara", "Ventura", "Imperial", "San Diego", "Orange", "Los Angeles", "Riverside", "San Bernardino")), # ordered by highest demand
    type = factor(type, levels = c("Production", "Demand"))
  ) %>%
  ggplot(aes(x = county, y = count, pattern = type)) +
  geom_col_pattern(position = "dodge", alpha = 0.9, color = "grey0") + # fill = pal_regions2,
  scale_pattern_manual(values = c("none", "wave")) +
  scale_pattern_fill_manual(values = c("black", "white")) +
  labs(y = "", x = "") + # Hydrogen (kg/day)
  theme_minimal() +
  theme(
    axis.title = element_text(size = 14, family = "mont"),
    axis.title.x = element_blank(),
    plot.title = element_text(size = 14, hjust = 0.5, family = "mont"),
    # plot.title.position = "plot",
    # plot.subtitle = element_text(size = 14, hjust = 0.5, color = "grey50"),
    legend.direction = "horizontal",
    legend.position = "top",
    legend.box.just = "center",
    # legend.justification = "center",
    legend.title = element_blank(),
    panel.grid = element_blank(),
    axis.text.y = element_text(size = 12, family = "mont"),
    axis.text.x = element_blank(),
    legend.text = element_text(size = 12, family = "mont")
  ) +
  scale_y_continuous(expand = c(0.02, 0), labels = scales::label_number(scale_cut = cut_short_scale())) +
  # expand_limits(y = max(7000000)) +
  # geom_text(aes(label = (c(
  #   "2.5M", "5.9M",
  #   "1.3M", "1.9M",
  #   "1.9M", "4.9M",
  #   "6.1M", "1.8M"
  # ))), color = "black", vjust = 0.5, hjust = -0.3, position = position_dodge(width = 0.9)) +
  guides(fill = guide_legend(reverse = TRUE)) +
  coord_flip()

# ggsave("figures/top5_county_supply_demand.png",bg="white", units = "in", width = 7)
print(p3)

Southern CA Counties Differences Dumbbell Plot

# format a signed difference (supply - demand) as an abbreviated label, e.g. "+282k" for a surplus
custom_number_format <- function(x) {
  sign_chr <- ifelse(x >= 0, "+", "-") # "+" for surplus, "-" for deficit
  ifelse(abs(x) >= 1e6,
    sprintf("%s%.1fM", sign_chr, abs(x) / 1e6),
    sprintf("%s%.0fk", sign_chr, abs(x) / 1e3)
  )
}

socal_counties <- h2_counties %>%
  filter(county %in% southern_ca_counties) %>%
  mutate(
    # compute the label from the data instead of hard-coding the surplus value
    difference_abbr = custom_number_format(supply_h2_kg_d - demand_h2_kg_d),
    dotted_line = ifelse(difference_description == "more supply", demand_h2_kg_d, supply_h2_kg_d)
  )


p3b <- socal_counties %>%
  ggplot() +
  geom_segment(aes(x = 0, xend = dotted_line, y = reorder(county, difference), yend = reorder(county, difference)), alpha = 0.4, linetype = "dotdash", linewidth = 0.5) +
  geom_segment(aes(x = demand_h2_kg_d, xend = supply_h2_kg_d, y = reorder(county, difference), yend = reorder(county, difference), color = "Difference"), alpha = 0.4, linewidth = 1) +
  geom_text(
    aes(
      x = (demand_h2_kg_d + supply_h2_kg_d) / 2, # Calculate the midpoint
      y = reorder(county, difference),
      color = "Difference",
      label = difference_abbr
    ),
    position = position_nudge(y = 0.25), # Adjust vertical position if needed
    show.legend = FALSE # Hide legend for the label
  ) +
  geom_point(aes(x = demand_h2_kg_d, y = reorder(county, difference), fill = "Demand"), shape = 21, size = 3) +
  geom_point(aes(x = supply_h2_kg_d, y = reorder(county, difference), fill = "Production"), shape = 22, size = 3) +
  scale_fill_manual(values = c(Demand = "#9C6F6F", Production = "#D19180")) +
  scale_color_manual(values = "black") +
  scale_x_continuous(labels = scales::label_number(scale_cut = cut_short_scale())) +
  labs(y = "", x = "Hydrogen (kg/ day)", title = "Difference between Daily Hydrogen Production and Demand \nin Southern California Counties (2030)") +
  theme_minimal() +
  theme(
    axis.title = element_text(size = 14, family = "mont"),
    axis.title.x = element_text(family = "mont", margin = margin(t = 10)),
    # axis.title.y = element_text(angle = 0, face = "bold", hjust = -0.5, vjust = 1.03, margin = margin(r = -85, l = 40)),  # Increase left margin margin = margin(r = -75, l = 40))
    plot.title = element_text(size = 16, hjust = 0.5, family = "mont", face = "bold", margin = margin(b = 10)),
    plot.title.position = "plot",
    legend.direction = "horizontal",
    legend.position = "bottom",
    legend.justification = "center",
    legend.title = element_blank(),
    legend.text = element_text(size = 12, family = "mont", face = "bold"),
    panel.grid = element_line(color = "grey97"),
    axis.text = element_text(size = 12, family = "mont"),
    # aspect.ratio = 0.9,
    # plot.background = element_rect(color = "#78483F", linewidth = 3)
  ) +
  guides(color = guide_legend(override.aes = list(x = -5, y = 1)))

print(p3b)

# ggsave("figures/faculty_review_figs/county_supply_demand.png",bg="white", unit = "in", width = 10, height = 12)

Infographic (combined plots)

# best so far
plot <- p1 | (p3b / p2)
# print(plot)


plot + plot_annotation(
  title = "How much Hydrogen?",
  subtitle =
    "
    The transportation sector is expected to be the earliest adopter of hydrogen fuel for use in light-, medium-, and heavy-duty fuel cell electric
  vehicles (FCEVs). In California, production of green hydrogen through electrolysis is constrained to areas with sufficient wind and solar
  resources, water availability, and favorable permitting and regulation policies. Here, we explore the spatial match-up of potential green hydrogen
  supply and transportation sector demand in 2030 to get a sense of how important hydrogen distribution and alternative production methods will be.

  ",
  caption = "Source: Kristin Art"
) &
  theme(text = element_text(family = "mont")) # `font` is not a theme element; `&` applies the font family to every subplot

Reflection

Answer the following questions:

What challenges did you encounter or anticipate encountering as you continue to build / iterate on your visualizations in R?

I am finding it challenging to figure out what types of graphs would look alright together in the same graphic. I made some plot options to see how things might look together, but overall it feels like a lot of info to fit onto one page. I also wasted time on some more complicated plots before deciding to just stick with the simpler visualizations above. It also took a lot of time to adjust and fiddle with all of the plot components and aesthetics.

I expect it will also take a lot of time to fiddle with the layout of the overall infographic. I anticipate some head-banging to get plot sizes/aspect ratios, text sizes/layouts, white space/margins, etc. to look alright… particularly because I don’t have much experience using cowplot, patchwork, annotate, etc.

What ggplot extension tools / packages do you need to use to build your visualizations? Are there any that we haven’t covered in class that you’ll be learning how to use for your visualizations?

I will use patchwork, ggtext, ggpubr, and maybe gridExtra to build the rest of my visualization. I am also using ggpattern, which I don’t believe we’ve covered in class yet. ggpattern has pretty good documentation though.

What feedback do you need from the instructional team and / or your peers to ensure that your intended message is clear?

I would appreciate some coding tips on how to add and adjust whitespace and plot subcomponents. For example, is it possible to overlay one plot slightly over another using patchwork, or do you have to use an inset instead? How do you make sure all the text sizes are actually readable when certain plots are smaller or bigger: is it just manual adjustment in each subplot, or is there a function to help with that once subplots have been stitched together? Is it easier to set a figure size beforehand to make sure the appearance in the R plot window will be similar to the final saved product? What is the best way to save a high-quality image? I also saw some good tips in the visualizations from the course resource page, so I will spend some time studying the techniques used there.

I also have not had a chance to put all the infographic components together with text narration, so any recommendations on layout are also appreciated.
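As a starting point for the overlay and saving questions above, here is a minimal sketch of one possible approach. It uses placeholder plots built from the built-in mtcars dataset rather than the hydrogen figures, and the inset position, output size, resolution, and file name are arbitrary assumptions, not recommendations from the instructional team.

```r
library(ggplot2)
library(patchwork)

# Two placeholder plots (built-in data, not the hydrogen data)
main_plot <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  theme_minimal(base_size = 12)

small_plot <- ggplot(mtcars, aes(factor(cyl))) +
  geom_bar() +
  theme_minimal(base_size = 8)

# Overlay the small plot on the top-right corner of the main plot;
# left/bottom/right/top are fractions of the main plot's panel area
combined <- main_plot +
  inset_element(small_plot, left = 0.6, bottom = 0.6, right = 1, top = 1)

# Save at a fixed physical size and resolution so the result does not
# depend on the current size of the plot window
out_file <- file.path(tempdir(), "infographic_sketch.png")
ggsave(out_file,
  plot = combined,
  width = 10, height = 8, units = "in", dpi = 300, bg = "white"
)
```

Fixing the width, height, and dpi in ggsave() means text sizes can be judged from the saved file rather than from the plot window, which may help with the readability question as well.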